Claude/setup pip install k2 m4j #4

Merged
igerber merged 2 commits into main from claude/setup-pip-install-k2M4j
Jan 2, 2026

@igerber igerber commented Jan 2, 2026

No description provided.

claude added 2 commits January 2, 2026 12:16
Update license field from deprecated table format to simple string format
to comply with modern setuptools standards and eliminate deprecation warnings.

Changed from SPDX string format back to `{text = "MIT"}` format for
compatibility with current PyPI infrastructure which does not yet
support the License-Expression metadata field.
@igerber igerber merged commit 00d26c2 into main Jan 2, 2026
@igerber igerber deleted the claude/setup-pip-install-k2M4j branch January 3, 2026 12:52
igerber pushed a commit that referenced this pull request Jan 4, 2026
Review fixes:
- Add edge case validation in _compute_flci (se > 0, 0 < alpha < 1)
- Improve significance_stars docstring explaining partial identification
- Standardize error messages to include parameter values (M, Mbar, alpha)
- Make LP solver method configurable in _solve_bounds_lp
- Add clarifying comment about constraint matrix design for pre+post periods
- Improve CallawaySantAnna error message with actionable guidance

Notes:
- #4 (sensitivity_plot export) was verified as valid - function exists at
  honest_did.py:1437
- #1 (pre-period effects) verified correct - LP optimization covers all
  periods but only post-periods contribute to objective function
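
The input checks and parameter-bearing error messages named in the review fixes above could look roughly like this. `validate_flci_inputs` and `describe` are hypothetical helpers written for illustration; the real `_compute_flci` signature and its message wording are not shown in this thread.

```python
def validate_flci_inputs(se: float, alpha: float) -> None:
    """Raise ValueError unless se > 0 and 0 < alpha < 1 (hypothetical helper)."""
    # `not (se > 0)` also rejects NaN, because every comparison with NaN is False.
    if not (se > 0):
        raise ValueError(f"se must be positive, got se={se!r}")
    if not (0 < alpha < 1):
        raise ValueError(f"alpha must satisfy 0 < alpha < 1, got alpha={alpha!r}")

def describe(se: float, alpha: float) -> str:
    """Return 'ok' or the error message, for demonstration."""
    try:
        validate_flci_inputs(se, alpha)
    except ValueError as exc:
        return str(exc)
    return "ok"

print(describe(0.5, 0.05))
print(describe(0.0, 0.05))
print(describe(0.5, 1.0))
```

Including the offending value in the message lets a caller see which argument failed without re-running under a debugger.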
igerber pushed a commit that referenced this pull request Jan 4, 2026
Revised review reflects:
- #1, #4 verified as non-issues (correct by design)
- #3, #5, #6, #8, #13 addressed in commit e40d6b4
- Updated recommendation to approve and merge
- Remaining items are low-priority style suggestions for future PRs
igerber added a commit that referenced this pull request Apr 19, 2026
Phase 2 silent-failures audit — axis-G (backend parity). Closes the
coverage gap the audit flagged in three Rust-backed solver surfaces.
Test-only PR; any discovered divergences are marked `xfail(strict=True)`
and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope.

Finding #21 — `solve_ols` skip-rank-check parity (`linalg.py:369-373,
597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity`
covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank
(cond > 1e10), and rank-deficient collinear designs under
`skip_rank_check=True` on HC1. Backends agree on fitted values within
`rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.
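
One reason to compare fitted values rather than coefficients on collinear designs: two solvers may return different coefficient vectors that still produce the same fit. The sketch below is pure Python and does not call `solve_ols`. The `allclose` helper applies the `|a - b| <= atol + rtol * |b|` rule that NumPy's `allclose` uses, and the two coefficient vectors are stand-ins for two hypothetical backends.

```python
def fitted(X, beta):
    """Row-wise dot product X @ beta for lists of lists."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def allclose(a, b, rtol=1e-6, atol=1e-8):
    """Elementwise |a - b| <= atol + rtol * |b|, as in numpy.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# Rank-deficient design: the third column duplicates the second.
X = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 1.0],
     [1.0, 2.0, 2.0],
     [1.0, 3.0, 3.0]]

# Two valid least-squares solutions for y = 1 + 2*x (stand-ins for two backends).
beta_backend_a = [1.0, 2.0, 0.0]  # all slope on the first copy
beta_backend_b = [1.0, 0.5, 1.5]  # slope split across the two copies

fit_a = fitted(X, beta_backend_a)
fit_b = fitted(X, beta_backend_b)

print("coefficients agree:", allclose(beta_backend_a, beta_backend_b))
print("fitted values agree:", allclose(fit_a, fit_b))
```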

Finding #22 — `compute_synthetic_weights` parity (`utils.py:1134-1199`):
three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular
`Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg
variations are `xfail(strict=True)` with a baselined ~15-80% weight
divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is
projected gradient descent (`utils.py:1228`) — same QP, different
simplex vertices under near-degenerate inputs.
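
The "same QP, different solutions" explanation can be illustrated without either solver. In the sketch below two donor columns are identical, so any split of weight between them gives the same objective, and two algorithms that traverse the simplex differently can stop at different points with equal loss. The data and the `objective` helper are made up for illustration and are not `compute_synthetic_weights`.

```python
def objective(Y, w, target):
    """Squared error of the weighted donor combination against the target."""
    total = 0.0
    for row, t in zip(Y, target):
        pred = sum(y * wi for y, wi in zip(row, w))
        total += (pred - t) ** 2
    return total

# Rows are pre-periods, columns are donors; donors 0 and 1 are identical.
Y = [[1.0, 1.0, 3.0],
     [2.0, 2.0, 1.0],
     [4.0, 4.0, 2.0]]
target = [2.0, 1.5, 3.0]  # equals 0.5 * donor 0 + 0.5 * donor 2

w_first_copy = [0.5, 0.0, 0.5]   # shared weight entirely on donor 0
w_second_copy = [0.0, 0.5, 0.5]  # shared weight entirely on donor 1
w_split = [0.25, 0.25, 0.5]      # shared weight split evenly

for w in (w_first_copy, w_second_copy, w_split):
    assert abs(sum(w) - 1.0) < 1e-12 and min(w) >= 0.0  # w lies on the simplex
    print(w, objective(Y, w, target))
```

All three weight vectors minimize the objective, so a weight-level comparison can report a large divergence even when both solvers are correct.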

Finding #23 — TROP Rust grid-search + bootstrap parity
(`trop_global.py:688-750, 966-1006`): two parity tests in
`TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both
`xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6%
divergence), bootstrap SE under `seed=42` (~28% divergence, RNG
backend mismatch — Rust `rand` crate vs numpy `default_rng`).
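
The RNG-mismatch explanation can be sketched with two unrelated generators given the same seed. Here Python's `random.Random` and a toy linear congruential generator stand in for the real pair (the Rust `rand` crate and numpy `default_rng`); `TinyLCG` and `bootstrap_indices` are hypothetical names used only for this illustration.

```python
import random

class TinyLCG:
    """Minimal linear congruential generator, for illustration only."""

    def __init__(self, seed: int):
        self.state = seed

    def randrange(self, n: int) -> int:
        self.state = (1103515245 * self.state + 12345) % 2**31
        return (self.state >> 16) % n

def bootstrap_indices(rng, n: int, n_boot: int):
    """Draw n_boot resamples of n row indices using rng.randrange."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_boot)]

draws_a = bootstrap_indices(random.Random(42), n=10, n_boot=3)
draws_a_again = bootstrap_indices(random.Random(42), n=10, n_boot=3)
draws_b = bootstrap_indices(TinyLCG(42), n=10, n_boot=3)

print("same generator, same seed, identical draws:", draws_a == draws_a_again)
print("different generator, same seed, identical draws:", draws_a == draws_b)
```

A shared seed only guarantees reproducibility within one generator, so bootstrap resamples (and any standard error computed from them) need not match across backends.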

Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust
  entry points (`_solve_ols_rust`, `_rust_synthetic_weights`,
  `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`);
  no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's
  `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.

TODO.md logs three P1 follow-up entries: algorithmic unification for
`compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on
rank-deficient Y, TROP bootstrap RNG unification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
Closes BR/DR foundation gap #4 (real-dataset validation) from the
external-positioning gap list in ``project_br_dr_foundation.md``.

Validation artifact:

- ``docs/validation/validate_br_dr_canonical.py`` runs BusinessReport
  / DiagnosticReport on Card-Krueger (1994), mpdta (Callaway-Sant'Anna
  2021 benchmark), and Castle Doctrine (Cheng-Hoekstra 2013 under
  both CS and SA), dumping summary + full_report + selected to_dict
  blocks for each.
- ``docs/validation/br_dr_canonical_validation.md`` is the regenerable
  raw output.
- ``docs/validation/br_dr_canonical_findings.md`` is the hand-written
  synthesis: direction / verdict / sensitivity tier all match canonical
  interpretations, with two small wording bugs surfaced and fixed in
  this PR and two larger gaps queued as follow-up (SA HonestDiD
  applicability, target-parameter disambiguation).

Wording fixes:

1. Treatment-label capitalization. ``str.capitalize()`` lowercased
   every character after the first, flattening embedded abbreviations
   (``"the NJ minimum-wage increase"`` → ``"The nj minimum-wage
   increase"``) and proper-noun phrases (``"Castle Doctrine law
   adoption"`` → ``"Castle doctrine law adoption"``). Replaced with a
   ``_sentence_first_upper`` helper that preserves user-supplied
   casing.

2. ``breakdown_M == 0`` phrasing. The HonestDiD fragile sentence
   quoted ``{breakdown_M:.2g}x the pre-period variation``, which
   renders as a degenerate ``0x`` on the exact-zero case surfaced by
   Cheng-Hoekstra. At ``breakdown_M <= 0.05`` (covers 0 and near-zero
   values), both BR's summary and DR's overall_interpretation now say
   "includes zero even at the smallest parallel-trends violations on
   the sensitivity grid" instead.
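
The capitalization bug in fix 1 is easy to reproduce. The helper below is a minimal sketch of what a first-character-only upper-casing function could look like; the name follows the commit's `_sentence_first_upper`, but the body is an assumption, not the library's code.

```python
def sentence_first_upper(text: str) -> str:
    """Upper-case only the first character; leave the rest untouched."""
    return text[:1].upper() + text[1:]

label = "the NJ minimum-wage increase"
print(label.capitalize())           # The nj minimum-wage increase
print(sentence_first_upper(label))  # The NJ minimum-wage increase
print(sentence_first_upper("Castle Doctrine law adoption"))
```

Slicing with `text[:1]` rather than indexing with `text[0]` keeps the helper safe on an empty string.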

Tests: 5 new regressions in
``TestCanonicalValidationSurfaceFixes`` covering both fixes + three
boundary cases (exact zero, small positive, normal fragile value).
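
The three boundary cases can be sketched as follows. `fragile_phrase` is a hypothetical stand-in for the sentence-building code in BR/DR; only the `:.2g` format, the `0.05` threshold, and the replacement wording come from the commit message.

```python
def fragile_phrase(breakdown_M: float, near_zero: float = 0.05) -> str:
    """Return the fragment describing how small a violation breaks the result."""
    if breakdown_M <= near_zero:
        # Covers exact zero and near-zero values, avoiding a degenerate "0x".
        return ("includes zero even at the smallest parallel-trends "
                "violations on the sensitivity grid")
    return f"{breakdown_M:.2g}x the pre-period variation"

print(f"{0.0:.2g}x")         # the degenerate rendering the fix avoids
print(fragile_phrase(0.0))   # exact zero
print(fragile_phrase(0.03))  # small positive
print(fragile_phrase(0.5))   # normal fragile value
```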

Not in scope: Favara-Imbs (dCDH reversible-treatment dataset not
bundled), ImputationDiD / TwoStageDiD on canonical data (needed to
exercise the R42 untreated-outcome FE assumption branch on real
data), SA HonestDiD applicability gap. All tracked in the findings
doc for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
Close BR/DR gap #4: canonical-dataset regression guards + wording fixes
igerber added a commit that referenced this pull request Apr 25, 2026
…nuousDiD prerequisite list as profile-side screening + add first_treat caveat

P1 (the five profile-derived facts are not the "full" gate set):
Reviewer correctly noted that calling
`{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the
"full ContinuousDiD pre-fit gate set" overreaches. `profile_panel`
only sees the four columns it accepts and CANNOT see the separate
`first_treat` column that `ContinuousDiD.fit()` consumes. Verified
against `continuous_did.py:230-360`: `fit()` additionally rejects
NaN/inf/negative `first_treat`, drops units with `first_treat > 0`
AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose
`dose != 0` with a `UserWarning`. A panel that passes all five
profile-side checks can still surface warnings, drop rows, or raise
at fit time depending on the `first_treat` column the caller
supplies.
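
A rough sketch of the three `first_treat` behaviours listed above (reject, drop, force-zero with a warning), written against per-unit dictionaries. `screen_first_treat` and the record layout are hypothetical; the real logic lives in `continuous_did.py` and is not reproduced here.

```python
import math
import warnings

def screen_first_treat(units):
    """Apply the three first_treat rules to per-unit dicts (hypothetical)."""
    kept = []
    for u in units:
        ft, dose = u["first_treat"], u["dose"]
        if not math.isfinite(ft) or ft < 0:
            raise ValueError(f"invalid first_treat={ft!r} for unit {u['unit']!r}")
        if ft > 0 and dose == 0:
            continue  # treated timing but zero dose: unit is dropped
        if ft == 0 and dose != 0:
            warnings.warn(
                f"unit {u['unit']!r}: first_treat == 0 with dose={dose!r}; "
                "setting dose to 0",
                UserWarning,
            )
            u = {**u, "dose": 0.0}
        kept.append(u)
    return kept

units = [
    {"unit": "A", "first_treat": 0, "dose": 0.0},
    {"unit": "B", "first_treat": 3, "dose": 1.5},
    {"unit": "C", "first_treat": 3, "dose": 0.0},  # dropped
    {"unit": "D", "first_treat": 0, "dose": 2.0},  # coerced with a warning
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = screen_first_treat(units)
print([u["unit"] for u in result], [u["dose"] for u in result], len(caught))
```

None of these outcomes can be predicted from the dose column alone, which is the point of the caveat.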

Reframed the wording in five surfaces from "full gate set" to
"profile-side screening checks" with an explicit caveat that the
checks are necessary-but-not-sufficient and that `ContinuousDiD.fit()`
applies separate `first_treat` validation:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
  out the screening framing explicitly + lists the `first_treat`
  validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
  (aligned with public contract: most fields descriptive,
  `dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote
  the multi-paragraph block to describe screening + first_treat
  caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
  feature paragraph: screening checks + necessary-not-sufficient
  language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
  reasoning chain (rewrote step 2 to call out screening +
  first_treat caveat; clarified counter-example #4 that
  `P(D=0) > 0` is required under BOTH `control_group="never_treated"`
  and `"not_yet_treated"`, not just default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.

P2 (test coverage for the missing `first_treat` caveat):
Added a content-stability assertion in `tests/test_guides.py`:
`assert "first_treat" in text` so the autonomous guide cannot
silently drop the explicit `first_treat` validation caveat.

P3 (helper / test-name inconsistency with public contract):
Renamed `test_treatment_dose_does_not_gate_continuous_did` to
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and rewrote its docstring to match the now-honest public contract
("most fields descriptive distributional context that supplements
the existing top-level screening checks"). The test body still
asserts the same two things — `treatment_varies_within_unit` fires
True on `0,0,d,d` paths and `has_never_treated` is independent of
`has_zero_dose` — both of which remain accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…light checks as standard-workflow predictions, not estimator gates

Reviewer correctly noted that calling
{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}
the "screening checks" / "necessary" gates of `ContinuousDiD`
overstates the contract. `ContinuousDiD.fit()` keys off the
separate `first_treat` column (which `profile_panel` does not see),
defines never-treated controls as `first_treat == 0` rows,
force-zeroes nonzero `dose` on those rows with a `UserWarning`,
and rejects negative dose only among treated units `first_treat > 0`
(see `continuous_did.py:276-327` and `:348-360`).

Two of the five checks (`has_never_treated`, `dose_min > 0`) are
first_treat-dependent: agents who relabel positive- or negative-dose
units as `first_treat == 0` trigger the force-zero coercion path
with a `UserWarning` and may still fit panels that fail those
preflights, with the methodology shifting. The other three
(`treatment_varies_within_unit`, `is_balanced`, duplicate-row
absence) are real fit-time gates that hold regardless of how
`first_treat` is constructed.
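
To make the five dose-side facts concrete, here is a sketch that computes them from `(unit, time, dose)` rows. The definitions are assumptions chosen for illustration (for example, `dose_min` is taken over nonzero doses only) and may differ from what `profile_panel` actually does.

```python
from collections import Counter, defaultdict

def profile_facts(rows):
    """Compute five dose-side facts from (unit, time, dose) rows.

    Definitions are illustrative assumptions, not the library's.
    """
    rows = list(rows)
    cell_counts = Counter((unit, time) for unit, time, _ in rows)
    doses = defaultdict(list)
    times = defaultdict(set)
    for unit, time, dose in rows:
        doses[unit].append(dose)
        times[unit].add(time)
    all_times = set().union(*times.values())
    nonzero = [dose for _, _, dose in rows if dose != 0]
    return {
        "has_never_treated": any(all(d == 0 for d in ds) for ds in doses.values()),
        "treatment_varies_within_unit": any(len(set(ds)) > 1 for ds in doses.values()),
        "is_balanced": all(ts == all_times for ts in times.values()),
        "has_duplicate_unit_time_rows": any(c > 1 for c in cell_counts.values()),
        "dose_min": min(nonzero) if nonzero else None,
    }

constant_dose = [("A", 1, 0.0), ("A", 2, 0.0), ("B", 1, 1.5), ("B", 2, 1.5)]
stepped_dose = [("A", 1, 0.0), ("A", 2, 0.0), ("C", 1, 0.0), ("C", 2, 2.0)]
print(profile_facts(constant_dose))
print(profile_facts(stepped_dose))
```

Every value here is derived without a `first_treat` column, so the facts describe the dose column only.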

Reframed every wording site to call these "standard-workflow
preflight checks" — predictive when the agent derives `first_treat`
from the same dose column passed to `profile_panel`, but not the
estimator's literal contract:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
  the multi-paragraph block; explicit standard-workflow definition
  + per-check first_treat dependency map + force-zero coercion
  caveat).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
  (already brief; stays consistent).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (long
  rewrite covering the standard-workflow framing + override paths).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
  trailing paragraph (both updated; opening bullet now spells out
  which of the five checks are first_treat-dependent vs. hard
  fit-time stops; trailing paragraph promotes the standard-
  workflow framing).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step
  2 (rewrote the gate-checking paragraph; counter-example #4
  expanded to enumerate (a) supply matching first_treat and accept
  rejection, (b) deliberate relabel + coercion, (c) different
  estimator; counter-example #5 distinguishes negative-dose
  treated-unit rejection from never-treated coercion).
- `CHANGELOG.md` Wave 2 entry (matches the new framing).
- `ROADMAP.md` AI-Agent Track building block (matches).

Test coverage:
- Renamed assertion messages in
  `test_treatment_dose_descriptive_fields_supplement_existing_gates`
  and `test_treatment_dose_min_flags_negative_dose_continuous_panels`
  to remove "authoritative gate" phrasing; reframed as "standard-
  workflow preflight" assertions consistent with the corrected docs.
- Added `test_negative_dose_on_never_treated_coerces_not_rejects`
  in `tests/test_continuous_did.py::TestEdgeCases` covering the
  reviewer's specific request: never-treated rows with NEGATIVE
  nonzero dose must coerce (with `UserWarning`) rather than raise
  the treated-unit negative-dose error. Sister to the existing
  `test_nonzero_dose_on_never_treated_warns` which covers the
  positive-dose case.

Rebased onto origin/main during this round (no conflicts beyond
prior CHANGELOG resolutions; main advanced 19 commits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…s-fallback wording; correct duplicate-row "fit-time stop" claim

P1 (relabel-to-manufacture-controls misframing):
Round 11 introduced wording across the guide, profile docstring,
CHANGELOG, ROADMAP, and test docstrings that presented intentional
`first_treat == 0` relabeling of nonzero-dose units as an
"option" / "fallback" for fitting `ContinuousDiD` when the
profile-side preflights (`has_never_treated`, `dose_min > 0`)
fail. REGISTRY does not document this as a routing option, and the
estimator still requires actual `P(D=0) > 0` because Remark 3.1
lowest-dose-as-control is not yet implemented. The force-zero
coercion at `continuous_did.py:311-327` is implementation behavior
for INCONSISTENT inputs (e.g., user accidentally passes nonzero
dose on a never-treated row), not a methodology fallback.

Reworded every site to remove the relabeling-as-option framing and
replace it with the registry-documented fixes when (1) or (5)
fails: re-encode the treatment column to a non-negative scale that
contains a true never-treated group, or route to a different
estimator (`HeterogeneousAdoptionDiD` for graded-adoption panels;
linear DiD with the treatment as a continuous covariate). Every
remaining "manufacture controls" mention in the guide, profile,
and tests is now an explicit anti-recommendation ("do not relabel
... to manufacture controls"). Updated:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (item (1):
  "not an opportunity to relabel ..."; item (5): coercion is
  "implementation behavior for inconsistent inputs, not a
  methodological fallback").
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (the
  When-(1)-or-(5)-fails paragraph names re-encode + alternative
  estimator only; explicit anti-relabel warning).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
  trailing paragraph (consolidated; opening bullet drops the
  relabel-as-fallback framing; trailing paragraph trimmed to a
  pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 step 2 + counter-
  example #4 + counter-example #5 (relabel-as-option language
  removed; explicit "do not relabel" callouts; counter-example #4
  options trimmed to (a) re-encode and (b) different estimator).
- `CHANGELOG.md` (relabel-as-option clause removed; replaced with
  re-encode / different-estimator framing).
- `ROADMAP.md` (same).
- `tests/test_profile_panel.py` two test docstrings (relabel-as-
  workflow language removed).

P2 (duplicate-row "hard fit-time stop" misclaim):
Round 11 wording said "duplicate-row failures are hard fit-time
stops" — incorrect. `_precompute_structures` at
`continuous_did.py:818-823` silently overwrites with last-row-wins,
no exception raised. Reworded as "hard preflight veto: the agent
must deduplicate before fit because `ContinuousDiD` otherwise uses
last-row-wins, no fit-time exception" in profile.py docstring,
guide §4.7 opening bullet, and §5.2 step 2 (now defers to §2 for
the breakdown). The previously-correct §2 description of the
silent-coerce path is preserved.
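
Last-row-wins is what a plain keyed lookup does when a `(unit, time)` cell appears twice. The sketch below shows the silent overwrite and a simple pre-fit duplicate check; it is a stand-in, assuming a dictionary-style structure, and is not the code in `_precompute_structures`.

```python
from collections import Counter

rows = [
    ("A", 1, 0.0),
    ("A", 2, 0.0),
    ("B", 1, 1.5),
    ("B", 1, 9.9),  # duplicate (unit, time) cell
    ("B", 2, 1.5),
]

# Building a (unit, time) -> value lookup silently keeps the last duplicate.
lookup = {}
for unit, time, value in rows:
    lookup[(unit, time)] = value

def duplicate_cells(rows):
    """Return the (unit, time) keys that occur more than once."""
    counts = Counter((unit, time) for unit, time, _ in rows)
    return sorted(key for key, count in counts.items() if count > 1)

print(lookup[("B", 1)])       # the later row's value survives
print(duplicate_cells(rows))  # what a preflight check would flag
```

No exception is raised during the overwrite, so the duplicate has to be caught before fitting.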

Length housekeeping:
The round-11 and round-12 expansions pushed `llms-autonomous.txt`
above `llms-full.txt`, breaking `test_full_is_largest`. Trimmed
~2.7KB by consolidating the §4.7 trailing paragraph + §5.2 step 2
trailing block to point at §2's full breakdown rather than
duplicating the per-check semantics. autonomous: 65364 chars,
full: 66058 chars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
… first_treat from dose" framing; add PanelProfile backward-compat defaults; fix test_continuous_did docstring

P1 (canonical ContinuousDiD setup vs. derive-from-dose framing):
Round 12 introduced a "standard workflow" description across the
guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings
that said agents derive `first_treat` from the same dose column
passed to `profile_panel`. Reviewer correctly noted this conflicts
with the actual ContinuousDiD contract (`continuous_did.py:222-228`,
`prep_dgp.py:970-993`, `docs/methodology/continuous-did.md:65-73`):
the canonical setup uses a **time-invariant per-unit dose** `D_i`
and a **separate `first_treat` column** the caller supplies — the
dose column has no within-unit time variation in this setup, so it
cannot encode timing. An agent following the rejected framing would
either build a `0,0,d,d` path (which `fit()` rejects) or keep a
valid constant-dose panel (in which case the dose column carries no
timing information).

Reworded every site to drop the derive-from-dose framing and
replace with the canonical setup. The five facts on the dose column
remain predictive of `fit()` outcomes BECAUSE the canonical
convention ties `first_treat == 0` to `D_i == 0` and treated units
carry their constant dose across all periods — so `has_never_treated`
proxies `P(D=0) > 0` and `dose_min > 0` predicts the strictly-
positive-treated-dose requirement, without any "derivation" of
`first_treat` from the dose column. Updated:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
  the multi-paragraph block to use the canonical-setup framing
  and added an explicit "agent must validate `first_treat`
  independently" note).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet.
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  step 2 + counter-examples #4 and #5 (now describe the
  canonical setup rather than a derive-from-dose workflow).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py`
  `test_treatment_dose_min_flags_negative_dose_continuous_panels`
  docstring/comments.

P2 (PanelProfile direct-construction backward compat):
Wave 2 added `outcome_shape` and `treatment_dose` to PanelProfile
without defaults, breaking direct `PanelProfile(...)` calls that
predate Wave 2. Made both fields default to `None` (moved them to
the end of the field list; both are `Optional[...]`). Added
`test_panel_profile_direct_construction_without_wave2_fields`
asserting that direct construction without the new fields succeeds
and yields `None` defaults that serialize correctly through
`to_dict()`.
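
The backward-compatibility pattern in P2 can be sketched with a small dataclass. `PanelProfileSketch` and its `n_units` / `n_periods` fields are hypothetical, and `dataclasses.asdict` stands in for the real `to_dict()`; only the two optional field names come from the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class PanelProfileSketch:
    # Pre-existing required fields (hypothetical names).
    n_units: int
    n_periods: int
    # Newer fields go last with defaults; a dataclass raises TypeError at
    # class-definition time if a non-default field follows a default one.
    outcome_shape: Optional[dict] = None
    treatment_dose: Optional[dict] = None

# A call written before the new fields existed still works.
old_style = PanelProfileSketch(n_units=50, n_periods=8)
print(old_style.outcome_shape, old_style.treatment_dose)
print(asdict(old_style))
```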

P3 (test_continuous_did.py docstring overstating sanction):
The new `test_negative_dose_on_never_treated_coerces_not_rejects`
docstring said the contract "lets agents legally relabel
negative-dose units as `first_treat == 0` to coerce them away."
Reworded as observed implementation behavior for inconsistent
inputs, NOT a sanctioned routing option — the test locks in the
coercion contract while the autonomous guide §5.2 explicitly tells
agents not to use this path methodologically.

Length invariant maintained: autonomous (65748 chars) < full
(66031 chars); `test_full_is_largest` still passes (compares
character count, not byte count, so on-disk size with UTF-8
multi-byte characters differs from the assertion target).
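
The character-versus-byte distinction is easy to check. In this small example the section sign occupies one character but two bytes in UTF-8; the sample string is illustrative and not taken from the guide files.

```python
text = "§2 field reference"

n_chars = len(text)                  # what a len()-based assertion compares
n_bytes = len(text.encode("utf-8"))  # what the file occupies on disk

print(n_chars, n_bytes)
```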

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…fixes" overclaim for ContinuousDiD recoding

P1 (overclaiming registry endorsement of recoding):
Reviewer correctly noted the round-13/14 wording across the
public-facing surfaces called re-encoding the treatment column a
"registry-documented fix" / "documented option" / "documented
fallback". REGISTRY only documents the `P(D=0) > 0` requirement
and explicitly notes Remark 3.1's lowest-dose-as-control fallback
is NOT implemented in this library. Re-encoding is an agent-side
preprocessing choice that the registry neither endorses nor
forbids — calling it "registry-documented" was an over-claim.

Reworded twelve sites to drop the "documented" framing:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (items
  (1) and (5)).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
  When-(1)-or-(5)-fails paragraph.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet
  trailing language.
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph
  (consolidated to a pointer at §2; reduced redundancy).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  counter-example #4.
- `tests/test_profile_panel.py` two test docstrings + one
  inline assertion message + one trailing comment.
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.

The corrected framing across all surfaces:
- Honestly state the contract: `ContinuousDiD` requires
  `P(D=0) > 0` and positive treated doses; Remark 3.1 not
  implemented.
- When the contract isn't met, say `ContinuousDiD` "as currently
  implemented does not apply" — not "do this fix."
- Mention routing alternatives that ARE in the library and DON'T
  require `P(D=0) > 0`: `HeterogeneousAdoptionDiD`, linear DiD
  with a continuous covariate. Those are routing facts, not
  methodology endorsements.
- Re-encoding stays in the prose as an "agent-side preprocessing
  choice that changes the estimand and is not documented in
  REGISTRY as a supported fallback" — explicitly NOT endorsed.

Length housekeeping: trimmed redundancy in the §4.7 trailing
paragraph (consolidated to a pointer at §2) and tightened the §2
recoding paragraph; autonomous (65984 chars) < full (66031),
`test_full_is_largest` green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…s "negative dose" branches; HAD only valid on the former

Reviewer correctly noted that the round-15/16 wording listed
`HeterogeneousAdoptionDiD` as a routing alternative whenever
`ContinuousDiD` fails on the dose-related preflights, but HAD
itself requires non-negative dose support and raises on negative
post-period dose at `had.py:1450-1459` (paper Section 2). On a
panel with `dose_min < 0`, routing to HAD silently steers an agent
into the same fit-time error. Verified the rejection at
`had.py:1450-1459`.

Reworded every site to split the two failure modes:

- Branch (a): `has_never_treated == False` (no zero-dose controls
  but all observed doses non-negative). `ContinuousDiD` does not
  apply (Remark 3.1 not implemented). HAD IS a routing alternative
  on this branch (HAD's contract requires non-negative dose,
  satisfied here); linear DiD with a continuous covariate is
  another.
- Branch (e): `dose_min < 0` (negative treated doses).
  `ContinuousDiD` does not apply AND HAD is **not** a fallback
  either — HAD raises on negative post-period dose
  (`had.py:1450-1459`). Linear DiD with a signed continuous
  covariate is the applicable alternative on this branch.
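
The two branches above can be summarized as a small decision function. `routing_alternatives` is a hypothetical helper for illustration, not part of the library; it only encodes the routing facts stated in this commit message.

```python
def routing_alternatives(has_never_treated: bool, dose_min: float) -> list[str]:
    """Alternatives when ContinuousDiD's dose preflights fail (hypothetical)."""
    if dose_min < 0:
        # Branch (e): HAD also raises on negative dose, so it is excluded.
        return ["linear DiD with a signed continuous covariate"]
    if not has_never_treated:
        # Branch (a): no zero-dose controls, all doses non-negative.
        return [
            "HeterogeneousAdoptionDiD",
            "linear DiD with a continuous covariate",
        ]
    # Both dose preflights pass; no rerouting is needed on these grounds.
    return []

print(routing_alternatives(has_never_treated=False, dose_min=0.5))
print(routing_alternatives(has_never_treated=True, dose_min=-1.0))
print(routing_alternatives(has_never_treated=False, dose_min=-1.0))
```

The negative-dose check comes first so that a panel failing both preflights is never routed to HAD.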

Updated wording across:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (refactored
  from item-by-item duplication into a numbered list with a single
  "Routing alternatives when (1) or (5) fails" section that splits
  the two branches; trimmed redundancy).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (split
  the When-(1)-or-(5)-fails paragraph into the two branches).
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph
  (consolidated to a pointer at §2's split discussion).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  counter-example #4 (no never-treated branch: HAD applies) and
  counter-example #5 (negative-dose branch: HAD does NOT apply,
  cite `had.py:1450-1459`).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` two test docstrings/comments.

Added `test_autonomous_negative_dose_path_does_not_route_to_had`
in `tests/test_guides.py` asserting that §5.2 explicitly cites
`had.py:1450-1459` on the negative-dose branch (used a single-
line fingerprint since the prose phrase "non-negative dose
support" is split across newlines in the rendered guide).

Length housekeeping: trimmed counter-example #4 and #5 prose +
§4.7 trailing paragraph to point at §2's split discussion;
autonomous (65374 chars) < full (66031), `test_full_is_largest`
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>